Extract Numbers From a String 


See Also 


Extracting numbers from text strings, removing unwanted characters, Michael Cleverly, comp.lang.tcl,
2002-06-23


An explanation with several examples. 


Description 


The following regular expression matches an optional leading + or -, an optional integer part, an 
optional decimal point, more digits, and an optional trailing exponent. 


[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)? 


The tricky part about this expression is that in the absence of a ., the part of the pattern that normally 
matches the mantissa matches the integer part instead. 
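As a minimal sketch of the expression in use, the snippet below pulls every number out of a sentence with regexp -all -inline. The variable names simple and sampleText are made up for illustration, and the exponent group is written as non-capturing, (?:...), so that -inline returns only whole matches; that is an adjustment for this sketch, not part of the expression above. Note how "42", which has no ".", is matched entirely by the trailing [0-9]+.

```tcl
# Expression from the text, with the exponent group made non-capturing
# (an adjustment for this sketch so -inline lists only whole matches).
set simple {[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?}

# sampleText is a hypothetical input chosen for illustration.
set sampleText "offset -3.14, gain .5, count 42, scale 6.02e23"

set found [regexp -all -inline $simple $sampleText]
puts $found ;# -> -3.14 .5 42 6.02e23
```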


A similar but longer expression makes the integer portion optional in a different way, by adding an
extra branch (|). The original version was posted to comp.lang.tcl by Roland B. Roberts:

[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?
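The two expressions appear to accept the same strings, so a quick side-by-side check can be reassuring before swapping one for the other. The sketch below runs both on a handful of hypothetical samples and collects any input on which they disagree. The names short, long, samples and mismatches are illustrative, and both expressions use a non-capturing exponent group so that the results are directly comparable.

```tcl
# Both expressions, with non-capturing groups so -inline yields one element.
set short {[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?}
set long  {[-+]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?}

# Hypothetical sample inputs, including a trailing "." and a non-number.
set samples {7 -7.25 .5 +1e3 5. abc 1.5E-9}

set mismatches {}
foreach s $samples {
    set a [regexp -inline -- $short $s]
    set b [regexp -inline -- $long $s]
    if {$a ne $b} {lappend mismatches $s}
}
puts "mismatches: [llength $mismatches]"
```

A handful of samples is not a proof of equivalence, only a sanity check.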


When extracting numbers from text, in order to allow separators in significant digits while avoiding 
picking up those separators when they occur elsewhere, a more complex expression is required: 
# uses expanded syntax; pass -expanded to regexp
set pattern {
    # any initial + or - characters
    [-+]?
    # order of the branches matters
    (?:
        # only significant digits
        [0-9_,]*[0-9]
    |
        # only mantissa
        \.[0-9]+
    |
        # the significant digits
        [0-9_,]*[0-9]
        # the mantissa
        \.[0-9]+
    )
    # optional exponent
    (?:
        [eE][-+]?[0-9]+
    )?
}


To add support for ratios, reuse the pattern: 


# braces around the name keep Tcl from reading $pattern(...) as an array element
set rpattern "${pattern}(?:\\s*/\\s*${pattern})?"
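A small sketch of the ratio idea, under some assumptions: the pattern used here is a compact, single-line stand-in for the commented pattern above (same branches, no comments), and the sample sentence and the variable name ratios are invented for illustration. The name is written as ${pattern} because Tcl would otherwise parse $pattern( as the start of an array reference.

```tcl
# Compact stand-in for the commented pattern (an assumption of this sketch).
set pattern {[-+]?(?:[0-9_,]*[0-9](?:\.[0-9]+)?|\.[0-9]+)(?:[eE][-+]?[0-9]+)?}

# A number, optionally followed by "/" and a second number.
set rpattern "${pattern}(?:\\s*/\\s*${pattern})?"

# Hypothetical input mixing ratios and plain numbers.
set ratios [regexp -inline -all $rpattern \
    "mix 1/2 cup with 3 / 4 cup, then add 1,000 more"]
puts $ratios ;# -> 1/2 {3 / 4} 1,000
```

Whitespace around the slash is kept in the match, so a caller that wants a canonical form would still need to strip it.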


set text "some, text. +100 . more text. -200 h lL 6.62607015e-34 1,000 xd 
100,000,000.234, and 34. , 1.67262171E-27 .22" 


regexp -expanded -inline -all $pattern $text
#-> +100 -200 6.62607015e-34 1,000 100,000,000.234 34 1.67262171E-27 .22


More information here.


WJG 2022-10-01 PYK 2022-10-09: A quick snippet on extracting a list of numbers from a string 
without using regular expressions: 


proc extractNumbers str {
    set res ""
    set a 0 ;# 1 while inside a run of digits
    foreach c [split $str ""] {
        if { [string is integer $c] } {
            set a 1
            append res $c
        } elseif { $c eq "," || $c eq "." } {
            # keep separators only when they follow a digit
            if {$a} { append res $c }
        } else {
            set a 0
            append res " "
        }
    }
    return [string trim $res]
}


WJG 2022-10-03 PYK 2022-10-09: Made some changes to the above procedure to allow for sub-string
prefixes (+-) and infixes (.,/^). Because a numeric sequence can end a clause, leaving a trailing
comma or full stop as sentence punctuation, such trailing characters are removed from each result.


proc extractNumbers str {
    set buff ""
    set res ""
    set lc ""
    set a 0 ;# 1 while inside a number sequence
    set pf "-+" ;# number sequence prefixes
    set if ".,/ ^" ;# number sequence infixes

    # parse the string character by character
    foreach c [split $str ""] {
        # respond to integers
        if { [string is integer $c] } {
            set a 1 ;# toggle START of integer sequence
            if {[string first $lc $pf] != -1 } { append buff $lc }
            append buff $c
        } elseif { [string first $c $if] != -1 } {
            if {$a} { append buff $c }
        } else {
            set a 0 ;# toggle END of integer sequence
            append buff " "
        }
        # keep tally for potential prefixes
        set lc $c
    }

    # remove sentence punctuation and reformat list
    foreach item $buff { lappend res [string trimright $item $pf$if] }

    return $res
}


In the following example, one deficiency is evident: an isolated comma or period is not properly
handled (it shows up as an empty element, {}):

extractNumbers $text
#-> +100 {} -200 6.62607015 -34 1,000 100,000,000.234 34 {} 1.67262171 -27 .22
extractNumbers "1/25 3.12344 1046"
#-> 1/25 3.12344 1046


WJG (13/10/22) Thanks for the comment. Not ‘handling’ isolated commas or periods is not a deficiency 
here. Both would indicate either a malformed sentence or number. 
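Whichever view one takes, a caller who does not want the empty elements can drop them after the fact. The sketch below is one possible way to do that, not part of either procedure: it starts from the result list shown in the example (copied here as a literal so the snippet stands alone) and uses lsearch to keep every element that is not the empty string. The variable names result and cleaned are illustrative.

```tcl
# Result list from the example above, copied as a literal for this sketch.
set result {+100 {} -200 6.62607015 -34 1,000 100,000,000.234 34 {} 1.67262171 -27 .22}

# Keep all elements that are not exactly the empty string.
set cleaned [lsearch -all -inline -not -exact $result {}]
puts $cleaned
#-> +100 -200 6.62607015 -34 1,000 100,000,000.234 34 1.67262171 -27 .22
```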


